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Abstract 


A v4Tsion of the Goddard Institute for Space Studies 
Atmospheric General Circulation Model wns vectorized to run 
on a CDC STAR-100. The numerical model was coded and run in 
two different vector languages, CDC STAR FORTRAN and LRLTRAN. 

A factor of 10 speed improvement over an IBM 360/95 was realized. 
Efficient use of the STAR machine required some re-designing of 
algorithms and logic. This seems to preclude the application of 
vectorizing compilers on the original scalar code to achieve the 
same results. Although vector languages have not yet reached 
maturity, they permit a more natural and efficient formulation 
for such numerical codes. 


INTRODUCTION 


This paper presents the experience gained from converting 
the Goddard Institute for Space Studies' (GISS) general circulation 
weather model to run efficiently on a CDC STAR-100. The conversion 
effort involved both reorganization and redesign of the original 
scalar code and the use of two vector languages : CDC STAR FORTRAN 

and LRLTRAN. Included are timing comparisons and suggestions for 
improving the usability of vector machines. The result of the con- 
version of this model supports the suggestion by Stone that sim- 
ple# mechanical adaptations of serial algorithms are not necessarily 
efficient on non-serial machines. 

THE MODEL 

The particular model used was a coarse grid version of the 
nine layer GISS general circulation model which in turn was 
derived from the three layer weather model developed by Arakawa and 
Mintz at U.C.L.A. This model performs, a time integration of the 
"primitive equations” on a regular cylindrical grid with different 
time steps for each of the three principal sections: dynamics 

(Winds# etc.)# physics (transport equations)# and radiation. The 
coarse grid consists of 24 meridians each of which contains 16 
latitude points. Although the number of grid points is signifi- 
cantly smaller than current production versions of the model# it 
was felt that this representation was adequate to perform the con- 
version economically. At GISS# the model is programmed in FORTRAN 
IV and runs on an IBM 360/95 (a faster version of the 360/91) . 

CONVERSION TO THE STAR 

* * 

There were several general considerations in the 
conversion effort. Foremost wa^ that although the STAR vector 


pipeline is fast, scalar operations are notj consequently 
sections of the program had to be substantially reorganized 
to reduce the ratio of scalar to vector code. In fact, many 
scalar quantities were maintained in vector form, expanded over 
the entire grid. Moreover, small arrays were expanded where 
appropriate into longer vectors to make full use of the streaming 
efficiency of the vector hardware. 

Software compatibility between the two vector languages 
available dictated the use of singly-dimensioned arrays for all 
vector operations. Furthermore, the storage order of all vec- 
tors was altered so that longitudinal points for each latitude 
were in contiguous storage. This was done primarily because the 
dynamics section contains different computations for polar and 
non-polar points. The result of this reorganization was that 
the different computations could be performed on continuous as 
opposed to scattered storage. 

A second consideration was that several scalar algorithms 
did not lend themselves to direct vectorization and had to be 
rewritten. Circularity in the longitudinal computations is a 
recurring problem in the dynamics section. This was resolved by 
introducing two extra longitudinal points at ’each latitude thus 
creating an artificial wrap-around. 

Another instance of algorithm redesign involved weighted 
sums of products of quantities defined at neighboring gridpoints. 
By reordering the computation, weighx;ed sums of the individual 
quantities could be computed and the products formed afterwards. 
The latter method is more easily vector! zable’ than the former. 


A simulation of the vectorized code was conducted in . 
scalar FORTRAN to test the validity of the reorganized logic and 
the modified algorithms. 

LANGUAGES AND CODING TECHNIQUES 

Since the numerical model is considered to be a research 
tool, coding proceeded under the assumption that it should be 
possible to easily make future changes while maintaining read- 
ability. Consequently, it was decided that a high level language 
would be used throughout. 

Two STAR vector languages were selected: .(1) LRLTRAN, 

written at the Lawrence Livermore Laboratory (LLL) , University 
of California, and (2) CDC's STAR FORTRAN, a standard program 
product. LRLTRAN is a programming language derived from FORTRAN 
containing most of the features of FORTRAN IV and, in addition, 
vector extensions 

Both languages offer an explicit vector syntax which 
permits the direct use of the STAR vector hardware. Even 
though both languages contain a syntax for embedded machine 
instructions (Q8INLINE) , our readability constraint precluded 
their use. This constraint was slightly relaxed when all oper- 
ations were coded as explicit dyads. This technique was intro- 
duced in order to prevent the .generation of unnecessary temporary 
vectors by the compilers. Whereas LRLTRAN contains a more concise 
notation for vector addressing, it requires that all vectors be 
explicitly declared, singly-dimensioned, and have a starting 
index of zero. This tends to create some confusion when 
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EQUIVALENCE statements are used to map vectors onto other vectors 
or multiply-dimensioned scalar arrays. STAR FORTRAN, on the 
other hand, adheres more closely to ANSI FORTRAN standards. It 
allows multiply-dimensioned arrays to be used as vectors without 
an explicit declaration. Although STAR FORTRAN tends to be more 
verbose, the overall flexibility offered by LRLTRAN introduces 
additional "overhead** which naturally leads to longer execution 
times. 

STAR EXECUTION AND TIMINGS 

The dynamics section was coded in both languages, whereas 
the physics section was written only in STAR FORTRAN. This 
paper discusses primarily those two sections, as vectorization of 
the radiation section is not yet complete. Several runs of the 
model were made at three separate installations. At LLL, the 
LRLTRAN version of the dynamics was used, while the entire code 
in STAR FORTRAN was run at CDC*s Data Center in Arden Hills, 
Minnesota. The timings for the original scalar code were 
measured on the 360/95 at GISS. A comparison of the different 
timings is presented in Tables I and II. The scalar code for 
the 360/95 was compiled under FORTRAN H Extended Plus, Optimiza- 
tion Level 2, and used 64 bit floating point arithmetic. The 
vector code was compiled under STAR FORTRAN version 2.0, cycle 
115P without optimization. 

As seen in Table I, the speed improvement for one call 
to the dynamics sections is almost twice that obtained from one 
call to the physics section. This is because the dynamics 
algorithms lend themselves more readily to vectorization. A run 
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of the model for 14 simulated days yielded a net speedup factor 
of 10 . 23 . While this figure includes several sources of overhead 
such as I/O and housekeeping, the effect of the slower physics 
is offset by the fact that the dynamics routines are called six 
timm as often. A comparison of the dynamics written in LRLTRAN 
(CHAT STAR version 98D) versus STAR FORTRAN (Table II) confirms 
that the increasf>d flexibility offered by LRLTRAN is achieved 
with some loss of performance. 

OBSERVATIONS AND RECOMMENDATIONS 

As a result of our experiences, several observations can 
be made. Vector codes generally require more memory than their 
scalar counterparts, primarily because intermediate quantities 
are now vectors rather than scalars. Since vector machines offer 
greater speeds, solving larger problems and increasing the 
resolution of current problems become feasible. It appears, 
therefore, that large main storage configurations are desirable, 
if not necessary. Experience with a similar model on ILLIAC 
IV^^^ confirms that the performance potential of a fast machine 
may not be fully realized when the user must* manage his own 
peripheral storage as an extension of main storage. This 
problem is solved on the STAR-100 by the use of a virtual memory 
operating system and peripheral stations which perform most 
of the detailed input/output-related computing functions. The 
combination of large main memory and an efficient virtual 
operating system, then, seems to be the best way to utilize a 
fast parallel or vector machine. 


It is critical that the underlying input/output control 
software handles the user I/O requests specified in higher level 
language statements as efficiently as it handles system I/O 
requests. In this respect and in spite of the available I/O 
facilities, unformatted FORTRAN I/O. for this model on the STAR 
was found to be a factor of 3.5 times slower than on the 360/95. 

Not only are machine/systems improvements necessary, but 
vector languages and compilers are also underdeveloped. As 
previously mentioned, both LRLTRAN and STAR FORTRAN suffer from 
a rigidity of notation which tends to make both languages more 
verbose. A combination of features offered by both languages 
would appear to be more appealing to the user. Code optimization, 
error detection, and debugging facilities offered by these compilers 
are not as sophisticated as one would expect on such a powerful 
machine as the STAR-100. 

In view of the substantial program redesign that was necessary 
to produce an efficient vector code, there are serious doubts about 
the effectiveness of a simple, mechanical translation of serial code. 
In fact, we agree with Stone^^^ that vectorizing compilers which 
would produce such translations would be "..'.stop gaps at best”. 

CONCLUSION 

Since it appears that a vector formulation is a more 
"natural” expression of the problem, it would seem that as 
vector machines become more prevalent, increasing numbers of 
algorithms will be written directly in vector form. Therefore, 
it is our opinion that vectorizers now have and will continue 




to have only limited utility. Thui, the greater part of future 
effort in software development should be directed towards the 
vector languages themselves. 
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TABLE I. MACHINE COMPARISONS 



360/95 (GISS) secff^ 

STAR(CDC) secs. 

95: STAR 

Dynamics* 

0.65 

0.056 

11.61:1 

Physics* 

0.41 

0.070 

5.85:1 

14-day Run 

1472.31 

143.92 

10.23:1 


TABLE II. LANGUAGE COMPARISONS 



LRLTRAN(LLL) sees. 

STAR FORTRAN (CDC) 

secs. 

LLL:CDC 

Dynamics* 

0.085 

0.056 

1.52:1 


*Timings are foz a single call 
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